Adaptive Algorithms for Automated Processing of Document Images

Author: Agrawal Mudit
Publication date: 01/01/2011

Large-scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters, and operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections. We first propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance-transform-based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach lies in its determination of the best approximation to the clutter-content boundary with text-like structures. Second, we describe a page segmentation technique called Voronoi++ for complex layouts, which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multilingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that considers components' separation features combined with Docstrum-based [O'Gorman1993] angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features. Finally, our research proposes a generic model to segment and recognize characters for any complex syllabic or non-syllabic script, using font models.
This concept is based on the fact that font files contain all the information necessary to render text and thus a model for how to decompose them. Instead of script-specific routines, this work is a step towards a generic character and recognition scheme for both Latin and non-Latin scripts.

Digital Repository at the University of Maryland

The IDENTIFY study: the investigation and detection of urological neoplasia in patients referred with suspected urinary tract cancer - a multicentre observational study

Author: Abdullah Nasreen
Abroaf Ahmed
Acher Peter
Adams Robert
Adasonla Kelvin
Adimonye Anthony
Adwin Zainal
Ager Michael
Agrawal Sachin
Ahmad Adnan
Ahmed Hashim
Akaln Mustafa Kaan
Al-Ibraheem Nihad
Aldiwani Mohammed
Almpanis Stephanos
Alonso Cristina Plaza
Andres Rosado Mario
Aning Jonathan
Antón-Juanilla Marta
Appanna Timson
Assmus Mark A.
Austin Tomas
Ayres Ben
Bagley Joseph
Balderas Olga
Barcelos André
Bardet Florian
Barratt Rachel
Bass Edward
Bdesha Amar
Beder Daniel
Bedi Nishant
Bele Uros
Benitez Cayo Augusto Estigarribia
Bennett Adam
Bhatt Nikita
Bi Hai
Black Peter C
Blick Chris
Boltri Matteo
Bonnin Thierry
Boxall Nicholas E.
Brittain James
Brophy Tom
Brown Christian
Brown Kevin
Browne Clíodhna
Bullock Nicholas
Burden Helena
Burnhope Tara
Cadena Iván Revelo
Campain Nicholas
Capoun Otakar
Carrion Diego M
Castillo Elba Canelon
Catto James
Cebola Ana
Chahal Rohit
Challacombe Ben
Chan Luke
Chau Edwin
Chaudry Aasem
Chin Yew Fung
Chippagiri Arvinda
Chlosta Piotr L
Chu Timothy Shun Man
Chuchu Naomi
Cimarra Fernando
Claps Francesco
Clark Jennifer
Clarke Holly
Clarke Laurence
Cohen Daniel
Cooper Meghan
Cormier Luc
Cortes Victor Parejo
Courcier Jean
Crawford-Smith Hugh
Crespo-Atín Víctor
Crockett Matthew
Cruz Ricardo
Czech Anna K
Cózar-Olmo Jose Manuel
Das Arighno
Davenport Kim
David Rotimi
Day Elizabeth
Deeks Jon
Deem Samuel
Dell'Atti Lucio
Derbyshire Laura
Desai Ankit
Desouky Elsayed
Dhanasekaran Ananda Kumar
Dhera Karishma
Dinneen Eoin
Dowling Catherine
Downey Alison P.
Drake Tamsin
During Vinnie
Ebur Andrea
Edison Eric
Ellis Ricky
Ellul Tom
Emberton Mark
Erotocritou Paul
Esler Rachel
Fen Koo Hui
Feneley Mark
Fiala Vojtech
Figueiredo Arnaldo
Filtekin Yigit
Forster Luke
Frankel Jason
Freire Maria José
Gadiyar Neha
Gaines Emily
Gallagher Kevin M.
Gallego Maria Camacho
Gallegos Christopher
Galosi Andrea B
Gao Chuanyu
García de Jalón Ángel
Garg Tullika
Gauhar Vineet
Gillams Kathryn
Glaser Zachary A
Gnanapragasam Vincent
Goldsmith Louise
Gontero Paolo
Goonewardene Sanchia
Gordon Daniel
Gordon Danny
Green James
Gresty Helena
Grigorakis Alkiviadis
Gronostaj Katarzyna
Gómez Rivas Juan
Hale Nathan
Hamid Syed Sami
Haroon Usman
Hawkins Rosalyn
Hawlina Simon
He Ming
Hellawell Giles
Hernandez Juan
Hernando Jamie Borrego
Herranz-Yagüe José Antonio
Hessell Robert
Higgs Claire
Ho Cherrie Wing Yin
Hobbs Catherine
Hori Satoshi
Horie Shigeo
Houlton Kathleen
Hrouda David
Hsu Ray
Iacovou John
Ibrahim Ibrahim
Ibrahim Youssed
Inder Shakeel Mohammud
Irani Jacques
Jain Sunjay
Jarimba Roberto
Jefferies Matthew
Jelski Joseph
Jones Jennifer
Joniau Steven
Kalsi Jas
Karsza Dávid
Kasivisvanathan Veeru
Kata Slawomir Grzegorz
Kelly John
Khadhouri Sinan
Khalid Raihan
Khan Shahid
Khawaja Faizan
Kilic Enes
Kitamura Kosuke
Knight Allen
Kocher Neil
Kondža Andraž
Kouli Omar
Kovács Gábor
Kulkarni Meghana
Kum Francesca
Kumar Vivek
Kumaradeevan Jeevan
Kynaston Howard
La Montagna Giuseppe
Lai Billy
Lal Asim A
Lam Chon Meng
Lam Gitte
Lam Wayne
Lau David H. W.
Lau David Hua Wu
Leask Jamie
Lebacle Cédric
Lee Taeweon
Lehman Kathleen
Leminski Artur
Lenart Gordan
Li Mo
Liew Matthew
Lillaz Beatrice
Lloyd Aimee
Lobo Niyati
Lopes Sofia Pinheiro
Lyons Hannah
Ma Lulin
MacKay Alison
MacKenzie Kenneth R.
MacLennan Graeme
Mahmalji Wasim
Mains Edward
Mainwaring Anna
Mak David
Mallett Susan
Manecksha Rustom P
Mangat Reshma
Manjavacas Pablo Oteo
Mannas Miles P.
Marathe Shekhar
Marchiñena Patricio Garcia
Mariappan Paramananthan
Marra Giancarlo
Martin-Way David
Martinez Levin
Martinez-Piñeiro Luis
Matanhelia Mudit
Maw Jonny
Mazzoli Simone
McCann Conor
McConkey Robert
McGrath John S.
McKay Alastair
Meeks Joshua
Minervini Andrea
Mistry Kiki
Moore Madeline
Moore Sacha
Morris Steve
Morton Lawrie
Mostafid Hugh
Mount Chloe
Muilwijk Tim
Mukhtar Bashir
Murtagh Kevin
Nagle Amy
Nambiar Arjun
Nellensteijn Brechtje
Ng ChiFai
Nielsen Matthew
Nikles Sven
Nkwam Nkwam
Nolazco Jose Ignacio
Norris Joseph
Nyanhongo Donald
O'Meara Sorcha
O'Rourke John
Olivier Jonathan
Omran Breish Mohamed
Oo Aye Moh Moh
Oomen Robert J.A.
Osei-Bonsu Peter K
Otaola-Arca Hugo
Ouzaid Idir
Palagonia Erika
Papworth Emma
Paramore Louise
Parker Sidney
Parson Sian
Pasha Muhammad
Patel Dhruv
Patel Trushar
Pavan Nicola
Peters Francesca
Phan Yih Chyn
Pira Matea
Pita Hernado Rios
Pizzuto Giuseppe
Planelles Paula
Plo Teresa Cabañuz
Poves Victoria Capapé
Puche-Sanz Ignacio
Qin Zijian
Rai Sonpreet
Raman Jay D
Ramos Sónia
Randhawa Karen
Raveenthiran Sheliyan
Rico Luis
Rintoul-Hoad Sophie
Ristau Benjamin
Ritchie Robert
Rivero Marta Viridiana Muñoz
Rowe Tracey
Russell Andrew
Sahibzada Iqbal
Sangaralingam Shanthi
Schneider Alexandre
Schreiter Brielle
Selph John P
Sengupta Shomik
Serag Hosam
Shah Taimur T.
Sharma Abhishek
Sherwood Benedict
Shrotri Nitin
Silva Alberto
Simmons Lucy
Simpson Helen
Smith Peter
Smrkolj Toma
Stamirowski Remigiusz
Sukhu Troy
Suliman Ahmed M
Suthaharan Denula
Swami Satchi Kuchibhotla
Sweeney Paul
Takwoingi Yemisi
Tallè Matteo
Tanasescu George
Tarin Mohamed
Tasso Giovanni
Teoh Jeremy YuenChun
Testa Joseph
Thangasamy Isaac
Tinay Ilker
Toma Tarq
Tomakovi Igor
Toniolo Jason
Trail Matthew
Trombetta Carlo
Turner Stacey
Tweedle James
Udzik Jakub
Ul Ain Qurrat
Uzan Audrey
Uçar Taha
Venturini Stefano
Villers Arnauld
Voulgaris Athanasios M
Vásquez Juan Luis
Warren Hannah
Webb Andrew
Wilby Daniel
Willemse Peter-Paul Michiel
Williams Simon
Wollin Tim
Wong Albert
Xylinas Evanguelos
Yıldırım Asıf
Yan Shahzad Sylvia
Younis Ayman
Yuruk Emrah
Zainuddin Zulkifli
Zelhof Bachar
Zimmermann Eleanor F.
Zotter Zsuzsanna
Çakurlu Turhan
Özgür Günal
Østergren Peter
Publication venue: 'Wiley'
Publication date: 31/10/2021

Objective: To evaluate the contemporary prevalence of urinary tract cancer (bladder cancer, upper tract urothelial cancer [UTUC] and renal cancer) in patients referred to secondary care with haematuria, adjusted for established patient risk markers and geographical variation.
Patients and Methods: This was an international multicentre prospective observational study. We included patients aged ≥16 years, referred to secondary care with suspected urinary tract cancer. Patients with a known or previous urological malignancy were excluded. We estimated the prevalence of bladder cancer, UTUC, renal cancer and prostate cancer, stratified by age, type of haematuria, sex, and smoking. We used a multivariable mixed-effects logistic regression to adjust cancer prevalence for age, type of haematuria, sex, smoking, hospitals, and countries.
Results: Of the 11 059 patients assessed for eligibility, 10 896 were included from 110 hospitals across 26 countries. The overall adjusted cancer prevalence (n = 2257) was 28.2% (95% confidence interval [CI] 22.3–34.1), bladder cancer (n = 1951) 24.7% (95% CI 19.1–30.2), UTUC (n = 128) 1.14% (95% CI 0.77–1.52), renal cancer (n = 107) 1.05% (95% CI 0.80–1.29), and prostate cancer (n = 124) 1.75% (95% CI 1.32–2.18). The odds ratios for patient risk markers in the model for all cancers were: age 1.04 (95% CI 1.03–1.05; P < 0.001), visible haematuria 3.47 (95% CI 2.90–4.15; P < 0.001), male sex 1.30 (95% CI 1.14–1.50; P < 0.001), and smoking 2.70 (95% CI 2.30–3.18; P < 0.001).
Conclusions: A better understanding of cancer prevalence across an international population is required to inform clinical guidelines. We are the first to report urinary tract cancer prevalence across an international population in patients referred to secondary care, adjusted for patient risk markers and geographical variation. Bladder cancer was the most prevalent disease. Visible haematuria was the strongest predictor for urinary tract cancer.

Online Research @ Cardiff

Clutter Noise Removal in Binary Document Images

Author: David Doermann
Mudit Agrawal
Publication date: 01/01/2009

The paper presents a clutter detection and removal algorithm for complex document images. The distance-transform-based approach is independent of the clutter's position, size, shape and connectivity with text. Features are based on a residual image obtained by analysis of the distance transform, and clutter elements, if present, are identified with an SVM classifier. Removal is restrictive, so text attached to the clutter is not deleted in the process. The method was tested on a collection of degraded and noisy, machine-printed and handwritten Arabic and English text documents. Results show pixel-level accuracies of 97.5% and 95% for clutter detection and removal, respectively. This approach was also extended with a noise detection and removal model for documents having a mix of clutter and salt-and-pepper noise.
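
As a rough illustration of why a distance transform can separate thick clutter from thin text strokes, the sketch below computes a two-pass city-block distance transform on a tiny binary image. It is not the paper's method: the residual-image features and the SVM classifier are omitted, and the function names, the toy image and the `min_depth` threshold are all assumptions made for this example.

```python
def distance_transform(image):
    """City-block distance from each foreground pixel (1) to the nearest background pixel (0)."""
    rows, cols = len(image), len(image[0])
    inf = rows + cols  # larger than any city-block distance inside the image
    dist = [[0 if image[r][c] == 0 else inf for c in range(cols)] for r in range(rows)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if dist[r][c]:
                if r > 0:
                    dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
                if c > 0:
                    dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if dist[r][c]:
                if r < rows - 1:
                    dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
                if c < cols - 1:
                    dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


def thick_pixels(dist, min_depth):
    """Foreground pixels lying at least min_depth steps inside a shape (hypothetical clutter cue)."""
    return {(r, c) for r, row in enumerate(dist) for c, d in enumerate(row) if d >= min_depth}


# Toy image (assumed): a one-pixel-wide stroke in column 1 and a 5x5 blob in columns 5-9.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
dist = distance_transform(image)
candidates = thick_pixels(dist, min_depth=2)
```

In this toy case every pixel of the thin stroke has distance 1, so only interior pixels of the blob pass the threshold. A real document would need a threshold tied to the estimated stroke width rather than a fixed constant.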

CiteSeerX

Crossref

Voronoi++: A Dynamic Page Segmentation approach based on Voronoi and Docstrum Features

Author: David Doermann
Mudit Agrawal
Publication date: 01/01/2009

This paper presents a dynamic approach to document page segmentation. Current page segmentation algorithms lack the ability to adapt dynamically to local variations in the size, orientation and distance of components within a page. Our approach builds upon one of the best algorithms, the work of Kise et al. based on Area Voronoi Diagrams [10], which adapts globally to page content to determine algorithm parameters. In our approach, local thresholds are determined dynamically based on parabolic relations between components, and Docstrum-based angular and neighborhood features are integrated to improve accuracy. Zone-based evaluation was performed on four sets of printed and handwritten documents in English and Arabic scripts, and an increase of 33% in accuracy is reported.
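
The Docstrum-style angular and neighborhood features mentioned above can be pictured as distance and angle measurements between each connected component and its nearest neighbours. The sketch below is only an illustration of that idea, not the Voronoi++ algorithm: it skips the Voronoi diagram and the parabolic threshold relations entirely, and the function name, the value of `k` and the sample centroids are assumptions.

```python
import math


def nearest_neighbor_features(centroids, k=2):
    """Return (distance, angle in degrees) for each component paired with its k nearest neighbours."""
    features = []
    for i, (x0, y0) in enumerate(centroids):
        # Distances to all other components, closest first (ties broken by index).
        neighbours = sorted(
            (math.hypot(x - x0, y - y0), j)
            for j, (x, y) in enumerate(centroids)
            if j != i
        )[:k]
        for distance, j in neighbours:
            x, y = centroids[j]
            angle = math.degrees(math.atan2(y - y0, x - x0))
            # Fold into [-90, 90): the direction of a pair does not matter for orientation.
            if angle >= 90:
                angle -= 180
            elif angle < -90:
                angle += 180
            features.append((distance, angle))
    return features


# Assumed component centroids: two horizontal text lines, 10 units between
# characters and 40 units between lines.
centroids = [(x, y) for y in (0, 40) for x in (0, 10, 20, 30)]
features = nearest_neighbor_features(centroids, k=2)
```

A histogram of the angles would peak at the dominant text orientation, and the distances cluster around the character spacing. Here every nearest-neighbour pair stays within its own text line because the line gap is larger than the spacing to the second neighbour.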

CiteSeerX

Crossref